WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 






PCT 

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6 : 

C12N 15/67, 15/82, 15/29, C07K 14/415 



A1



(11) International Publication Number: WO 97/46690 

(43) International Publication Date: 11 December 1997 (11.12.97)



(21) International Application Number: PCT/GB97/01414

(22) International Filing Date: 23 May 1997 (23.05.97) 



(30) Priority Data:
9611981.3    7 June 1996 (07.06.96)    GB



(71) Applicant (for all designated States except US): ZENECA
LIMITED [GB/GB]; 15 Stanhope Gate, London W1Y 6LN
(GB).

(72) Inventors; and
(75) Inventors/Applicants (for US only): DRAKE, Caroline, Rachel
[GB/GB]; (GB). BIRD, Colin, Roger [GB/GB]; (GB).
SCHUCH, Wolfgang, Walter [DE/GB]; Jealott's Hill Research
Station, Bracknell, Berkshire RG42 6ET (GB).

(74) Agents: HUSKISSON, Frank, Mackie et al.; Zeneca Agrochemicals,
Intellectual Property Dept., Jealott's Hill Research Station,
P.O. Box 3538, Bracknell, Berkshire RG42 6YA (GB).



(81) Designated States: AL, AM, AT, AU, AZ, BA, BB, BG, BR,
BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GE,
HU, IL, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT,
LU, LV, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT,
RO, RU, SD, SE, SG, SI, SK, TJ, TM, TR, TT, UA, UG,
US, UZ, VN, YU, ARIPO patent (GH, KE, LS, MW, SD,
SZ, UG), Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU,
TJ, TM), European patent (AT, BE, CH, DE, DK, ES, FI,
FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OAPI patent
(BF, BJ, CF, CG, CI, CM, GA, GN, ML, MR, NE, SN, TD,
TG).



Published 

With international search report. 

Before the expiration of the time limit for amending the 
claims and to be republished in the event of the receipt of 
amendments. 
A method for enhancing the expression of a selected gene in an organism while avoiding or reducing co-suppression involves the
synthesis of a DNA which is altered in nucleotide sequence and is capable of expressing a protein, ideally identical to a
protein already expressed by a DNA already present in the organism. This method ensures that sequence similarity between the two genes
is reduced enough to eliminate the phenomenon of co-suppression, allowing the over-expression of a specific protein. The method is
particularly suitable in plants.







ENHANCEMENT OF GENE EXPRESSION 

This invention relates to a method and material for enhancing gene expression
in organisms, particularly in plants. One particular, but not exclusive, application of the
invention is the enhancement of carotenoid biosynthesis in plants such as tomato
(Lycopersicon spp.).

In order to increase production of a protein by an organism, it is known
practice to insert into the genome of the target organism one or more additional copies
of the protein-encoding gene by genetic transformation. Such copies would normally
be identical to a gene which is already present in the plant or, alternatively, they may be
identical copies of a foreign gene. In theory, multiple gene copies should, on
expression, cause the organism to produce the selected protein in greater than normal
amounts; this is referred to as "overexpression". Experiments have shown, however,
that low expression or no expression of certain genes can result when multiple copies
of the gene are present (Napoli et al 1990 and Dorlhac de Borne et al 1994). This
phenomenon is referred to as co-suppression. It most frequently occurs when
recombinant genes are introduced into a plant already containing a gene similar in
nucleotide sequence. It has also been observed in endogenous plant genes and
transposable elements. The effects of co-suppression are not always immediate and can
be influenced by developmental and environmental factors in the primary transformants
or in subsequent generations.

The general rule is to transform plants with a DNA sequence the codon usage
of which approximates to the codon frequency used by the plant. Experimental analysis
has shown that introducing a second copy of a gene identical in sequence to a gene
already in the plant genome can result (in some instances) in the expression of the
transgene, the endogenous gene or both genes being inactivated (co-suppression). The
mechanisms of exactly how co-suppression occurs are unclear; however, there are
several theories incorporating both pre- and post-transcriptional blocks.

As a rule the nucleotide sequence of an inserted gene is "optimised" in two
respects. The codon usage of the inserted gene is modified to approximate to the
preferred codon usage of the species into which the gene is to be inserted. Inserted
genes may also be optimised in respect of the nucleotide usage with the aim of
approximating the purine to pyrimidine ratio to that commonly found in the target
species. When genes of bacterial origin are transferred to plants, for example, it is well
known that the nucleotide usage has to be altered to avoid highly adenylated regions,
common in bacterial genes, which may be misread by the eukaryotic expression
machinery as a polyadenylation signal specifying termination of translation, resulting in
truncation of the polypeptide. This is all common practice, and it is entirely logical that an
inserted sequence should mimic the codon and nucleotide usage of the target organism
for optimum expression.

An object of the present invention is to provide means by which co-suppression
may be obviated or mitigated.

According to the present invention there is provided a method of enhancing
expression of a selected protein by an organism having a gene which produces said
protein, comprising inserting into a genome of the said organism a DNA the nucleotide
sequence of which is such that the RNA produced on transcription is different from but
the protein produced on translation is the same as that expressed by the gene already
present in the genome.

The invention also provides a gene construct comprising in sequence a
promoter which is operable in a target organism, a coding region encoding a protein
and a termination signal, characterised in that the nucleotide sequence of the said
construct is such that the RNA produced on transcription is different from but the
protein produced on translation is the same as that expressed by the gene already
present in the genome.

The inserted sequence may have a constitutive promoter or a tissue or
developmental preferential promoter.

It is preferred that the promoter used in the inserted construct be different from
that used by the gene already present in the target genome. However, our evidence
suggests that it may be sufficient that the region between the transcription and
translation initiation codons, sometimes referred to as the "5' intervening region", be
different. In other words, the co-suppression phenomenon is probably associated with
the transcription step of expression rather than the translation step: it occurs at the
DNA or RNA levels or both.

The invention further provides transgenic plants having enhanced ability to
express a selected gene, and seed and propagating material derived from the said plants.

This invention is of general applicability to the expression of genes but will be
illustrated in one specific embodiment of our invention by a method of enhancing
expression of the phytoene synthase gene which is necessary for the biosynthesis of
carotenoids in plants, the said overexpression being achieved by the use of a modified
transgene having a different nucleotide sequence from the endogenous sequence.

Preferably said modified phytoene synthase gene has the sequence SEQ-ID-NO-1.

The invention also provides a modified chloroplast targeting sequence
comprising nucleotides 1 to 417 of SEQ-ID-NO-1.

In simple terms, our invention requires that protein expression be enhanced by
inserting a gene construct which is altered, with respect to the gene already present in
the genome, by maximising the dissimilarity of nucleotide usage while maintaining
identity of the encoded protein. In other words, the concept is to express the same
protein from genes which have different nucleotide sequences within their coding
region and, preferably, the promoter region as well. It is desirable to approximate the
nucleotide usage (the purine to pyrimidine ratio) of the inserted gene to that of the
gene already present in the genome. We also believe it to be desirable to avoid the use
of codons in the inserted gene which are uncommon in the target organism and to
approximate the overall codon usage to the reported codon usage for the target
genome.

The degree to which a sequence may be modified depends on the frequency of
degenerate codons. In some instances a high proportion of changes may be made,
particularly to the third nucleotide of a triplet, resulting in a low DNA (and
consequently RNA) sequence homology between the inserted gene and the gene
already present, while in other cases, because of the presence of unique codons, the
number of changes which are available may be low. The number of changes which are
available can be determined readily by a study of the sequence of the gene which is
already present and of its degeneracy.
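
The degeneracy study described above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure used for the invention: it relies on the standard genetic code, the helper names `synonymous_alternatives` and `degeneracy_profile` are hypothetical, and the input is the first ten codons of the natural TOM5 coding sequence as shown in Figure 1.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop).
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def synonymous_alternatives(codon):
    """Return the other codons that encode the same amino acid."""
    aa = CODE[codon]
    return sorted(c for c, a in CODE.items() if a == aa and c != codon)


def degeneracy_profile(cds):
    """Count the synonymous alternatives available at each codon of a CDS."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return [len(synonymous_alternatives(c)) for c in codons]


# First ten codons of the natural TOM5 sequence (Figure 1).
first_ten = "ATGTCTGTTGCCTTGTTATGGGTTGTTTCT"
profile = degeneracy_profile(first_ten)
# Codons with a count of zero (Met ATG, Trp TGG) are "unique" and cannot be changed.
unique_positions = [i for i, n in enumerate(profile) if n == 0]
```

Positions reported in `unique_positions` are zero-based codon indices; every other codon offers at least one silent change.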

To obtain the gene for insertion in accordance with this invention it may be
necessary to synthesise it. The general parameters within which the nucleotide
sequence of the synthetic gene compared with the gene already present may be
selected are:

1. Minimise the nucleotide sequence similarity between the synthetic gene and the
gene already present in the plant genome;

2. Maintain the identity of the protein encoded by the coding region;

3. Maintain approximately the optimum codon usage indicated for the target
genome;

4. Maintain approximately the same ratio of purine to pyrimidine bases; and

5. Change the promoter or, at least, the 5'-intervening region.
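
Parameters 1 to 3 above can be illustrated with a short Python sketch. It is a deliberately simple greedy recoding under stated assumptions and is not the design procedure actually used: each codon is swapped for the synonymous codon that differs from it at the most positions, and the `avoid` argument is a hypothetical, caller-supplied set of codons regarded as uncommon in the target species. Parameters 4 and 5 are not modelled.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop).
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def translate(cds):
    """Translate a coding sequence codon by codon."""
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds, avoid=frozenset()):
    """Swap each codon for the most dissimilar permitted synonymous codon.

    Codons with no permitted alternative (e.g. ATG, TGG) are left unchanged.
    """
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        options = [c for c, a in CODE.items()
                   if a == CODE[codon] and c != codon and c not in avoid]
        if options:
            # Ties are resolved by table order, so the result is deterministic.
            codon = max(options, key=lambda c, ref=codon: mismatches(c, ref))
        out.append(codon)
    return "".join(out)


native = "ATGTCTGTTGCCTTGTTATGGGTTGTTTCT"  # first ten codons of TOM5 (Figure 1)
recoded = recode(native)
```

Because only synonymous codons are considered, `translate(recoded)` equals `translate(native)` by construction. A real design would also weigh codon frequencies, base composition and restriction sites, as the list above requires.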

We have worked with the phytoene synthase gene of tomato. The DNA
sequence of the endogenous phytoene synthase gene is known (EMBL Accession Number
Y00521), and it was discovered that this sequence contained two sequencing errors toward
the 3' end. These errors were corrected in the following way: (1) delete the cytosine at
location 1365 and (2) insert a cytosine at 1421. The corrected phytoene synthase
sequence (Bartley et al 1992) is given herein as SEQ-ID-NO-2. Beginning with that
natural sequence we selected modifications according to the parameters quoted above
and synthesised the modified gene which we designated MTOM5 and which has the
sequence SEQ-ID-NO-1. Figure 1 herewith shows an alignment of the natural and
synthesised gene with retained nucleotides indicated by dots and alterations by dashes.
The modified gene MTOM5 has 63% homology at the DNA level and 100% at the protein
level, and the proportion of adenine plus thymine (the A+T content) is 54% in the
modified gene compared with 58% for the natural sequence.
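
The two measures quoted above, percentage identity and A+T proportion, can be computed as in the following Python sketch. The function names are hypothetical, and the inputs are only the first 30 bases of each sequence as given in Figure 1, so the values obtained differ from the whole-gene figures of 63% and 54%/58%.

```python
def percent_identity(a, b):
    """Percentage of positions at which two aligned sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def at_content(seq):
    """Percentage of bases in the sequence that are A or T."""
    return 100.0 * sum(base in "AT" for base in seq) / len(seq)


# First 30 bases of the natural and the synthetic coding sequence (Figure 1).
tom5 = "ATGTCTGTTGCCTTGTTATGGGTTGTTTCT"
mtom5 = "ATGAGCGTGGCACTTCTTTGGGTGGTGAGC"

identity = percent_identity(tom5, mtom5)
at_native, at_synthetic = at_content(tom5), at_content(mtom5)
```

Applied to the full 1239-base sequences of the sequence listing, the same functions give the whole-gene comparison.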

In the sequence listings provided herewith, SEQ ID NO 1 is the DNA sequence
of the synthetic (modified TOM5) gene referred to as MTOM5 in Figure 1, SEQ ID
NO 2 is the natural genomic phytoene synthase (Psy1) gene referred to as GTOM5 in
Figure 1, and SEQ ID NO 3 is the translation product of both GTOM5 and MTOM5.

In tomato (Lycopersicon esculentum), it has been shown that the carotenoid
lycopene is primarily responsible for the red colouration of developing fruit
(Bird et al 1991). The enzyme phytoene synthase, referred to herein
as Psy1, is an important catalyst in the production of phytoene, a precursor of
lycopene.

Psy1 catalyses the conversion of geranylgeranyl diphosphate to phytoene, the
first dedicated step in carotenoid biosynthesis.

The regulation and expression of the active Psy1 gene is necessary for the
production of lycopene and consequently the red colouration of fruit during ripening.
This can be illustrated by the yellow flesh phenotype of tomato fruits observed in a
naturally occurring mutant in which the Psy1 gene is inactive. In addition, transgenic
plants containing an antisense Psy1 transgene, which specifically down-regulates Psy1
expression, have also produced the yellow flesh phenotype of the ripe fruit.

When transgenic plants expressing another copy of the Psy1 gene (referred to
as TOM5) placed under the control of a constitutive promoter (being the Cauliflower
Mosaic Virus 35S promoter) were produced, approximately 30% of the primary
transformants produced mature yellow fruit indicative of the phenomenon of
co-suppression. Although some of the primary transformants produced an increased
carotenoid content, subsequent generations did not exhibit this phenotype, thus
providing evidence that co-suppression is not always immediate and can occur in
future generations.

The sequence of Psy1 is known and hence the amino acid sequence was
determined.

With reference to published GenBank genetic sequence data (Ken-nosuke Wada
et al 1992), a synthetic DNA was produced by altering the nucleotide sequence to one
which still had a reasonable frequency of codon use in tomato, and which retained the
amino acid sequence of Psy1. A simple swap between codons was used in cases where
there are only two codon options; in other cases the codons were changed
within the codon usage bias of tomato. Nucleotide sequence analysis indicated that the
synthetic DNA has a nucleotide similarity with Psy1 (TOM5, Bartley et al 1992) of
63% and amino acid sequence similarity of 100%.

The synthetic gene was then cloned into plant transformation vectors under the
control of the 35S promoter. These were then transferred into tomato plants by
Agrobacterium transformation, and both the endogenous and the synthetic gene appear
to express the protein. Analysis of the primary transformants shows no
evidence, such as the production of yellow fruit, of co-suppression between
the two genes.

The present invention will now be described by way of illustration in the
following examples.

EXAMPLE 1 

The coding region of the cDNA which encodes tomato phytoene synthase,
TOM5 (EMBL accession number Y00521), was modified since the original sequence
contained two errors towards the 3' end of the sequence. The sequence reported by
Bartley et al 1992 (J Biol Chem 267:5036-5039) for TOM5 cDNA homologues
therefore differs from TOM5 (EMBL accession number Y00521). For the purpose of
the production of the synthetic gene the sequence used is a corrected version of the
TOM5 cDNA which is identical to Psy1 (Bartley et al 1992).

Design of the sequence.

1. Potential restriction endonuclease cleavage sites were considered given the
constraints of the amino acid sequence. Useful sites around the predicted target
sequence cleavage site were introduced to aid subsequent manipulation of the
leader.

2. A simple swap between codons was used in cases where there are only two
codon options (e.g. lysine). In other cases codons were changed within the
codon usage bias of tomato as given by Ken-nosuke Wada et al (codon usage
tabulated from GenBank genetic sequence data, 1992. Nucleic Acids Research
20:S2111-2118). Priority was given to reducing homology and avoiding
uncommon codons rather than producing a representative spread of codon
usage.

3. A BamHI site was introduced at either end of the sequence to facilitate cloning
into the initial vector. At the 5' end 4 A were placed upstream of the ATG according
to the dicot start site consensus sequence (Cavener and Ray 1991, Eukaryotic
start and stop translation sites. NAR 19: 3185-3192).

4. The synthetic gene has been cloned into the vector pGEM4Z such that it can be
translated using SP6.

5. Restriction site, stemloop and codon usage analyses were performed, all results
being satisfactory.

6. The modified TOM5 sequence was termed CGS48 or MTOM5.

Sequence analysis

CGS48    AT content = 54%
TOM5     AT content = 58%

The nucleotide homology between TOM5 and CGS48 is 63%.
Amino acid sequence homology is 100%.

In summary the sequence TOM5 (Acc. No. Y00521) was extracted from the
GenBank database and modified to incorporate the following corrections: deleted C at
1365, inserted C at 1421. CGS48 is based on the CDS of the modified Y00521 and the
original sequence, whilst retaining translation product homology and trying to maintain
optimal tomato codon usage.

Assembly of CGS48

CGS48 was divided into three parts:
CGS48A: BamHI / KpnI
CGS48B: KpnI / SacI
CGS48C: SacI / BamHI

All three were designed to be cloned on EcoRI / HindIII fragments. The
sequences were divided into oligonucleotide fragments following computer analysis to
give unique complementarity in the overlapping regions used for the gene assembly.
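
The restriction sites that delimit the three parts can be located with a simple search, sketched below in Python. The recognition sequences are the standard ones for BamHI (GGATCC), KpnI (GGTACC) and SacI (GAGCTC); the function name `find_sites` is hypothetical, and the test fragment is bases 401 to 420 of SEQ ID NO 1. Because all three sites are palindromic, searching one strand is sufficient.

```python
# Recognition sequences of the enzymes used to divide CGS48.
SITES = {"BamHI": "GGATCC", "KpnI": "GGTACC", "SacI": "GAGCTC"}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [1-based start positions]} for every recognition site."""
    hits = {}
    for name, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start + 1)          # convert to 1-based numbering
            start = seq.find(motif, start + 1)   # allow overlapping matches
        hits[name] = positions
    return hits


fragment = "CCTTCAATTTGGGTACCATG"  # bases 401-420 of SEQ ID NO 1
hits = find_sites(fragment)
```

Positions are relative to the start of the string passed in, so an offset must be added when only part of a sequence is searched.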

The oligonucleotides were synthesised on an Applied Biosystems 380B DNA 
synthesiser using standard cyanoethyl phosphoramidite chemistry. The oligonucleotides 
were gel purified and assembled into full length fragments using our own procedures. 

The assembled fragments were cloned into pUC18 via their EcoRI / HindIII
overhangs.

Clones were sequenced bi-directionally using "forward" and "reverse"
sequencing primers together with the appropriate "build" primers for the top and
bottom strands, using the dideoxy-mediated chain termination method for plasmid
DNA.

Inserts from correct CGS48A, B and C clones were isolated by digestion with
BamHI / KpnI, KpnI / SacI and SacI / BamHI respectively. The KpnI and SacI ends of the
BamHI / KpnI and SacI / BamHI fragments were phosphatased. All three fragments
were co-ligated into BamHI cut and phosphatased pGEM4Z. Clones with the correct
sized inserts oriented with the 5' end of the insert adjacent to the SmaI site were
identified by PCR amplification of isolated colonies and digestion of purified plasmid
DNA with a selection of restriction enzymes.

A CsCl purified plasmid DNA preparation was made from one of these clones.
This clone (CGS48) was sequenced bi-directionally using "forward" and "reverse"
sequencing primers together with the appropriate "build" primers for the top and
bottom strands, using the dideoxy-mediated chain termination method for plasmid
DNA.

EXAMPLE 2 

Construction of the MTOM5 vector with the CaMV 35S promoter

The fragment MTOM5 (CGS48) DNA described in EXAMPLE 1 was cloned
into the vector pJRIRi (Figure 2) to give the clone pRD13 (Figure 3). The clone
CGS48 was digested with SmaI and XbaI and then cloned into pJRIRi which was cut
with SmaI and XbaI to produce the clone pRD13.

EXAMPLE 3 

Generation and analysis of plants transformed with the vector pRD13

The pRD13 vector was transferred to Agrobacterium tumefaciens LBA4404 (a
micro-organism widely available to plant biotechnologists) and used to transform
tomato plants. Transformation of tomato stem segments followed standard protocols
(e.g. Bird et al Plant Molecular Biology 11, 651-662, 1988). Transformed plants were
identified by their ability to grow on media containing the antibiotic kanamycin. Forty-nine
individual plants were regenerated and grown to maturity. None of these plants
produced fruit which changed colour to yellow rather than red when ripening. The
presence of the pRD13 construct in all of the plants was confirmed by polymerase
chain reaction analysis. DNA blot analysis on all plants indicated that the insert copy
number was between one and seven. Northern blot analysis on fruit from one plant
indicated that the MTOM5 gene was expressed. Six transformed plants were selfed to
produce progeny. None of the progeny plants produced fruit which changed colour to
yellow rather than red during ripening.

The results are summarised in Table 1 below. The incidence of yellow, or
mixed yellow/red (for example, striped) fruits is indicative of suppression of phytoene
synthesis. Thus, with the normal GTOM5 construct, 28% of the transgenic plants
displayed the co-suppressed phenotype. All the plants carrying the modified MTOM5
construct of this invention had red fruit, demonstrating that no suppression of phytoene
synthesis had occurred in any of them.



TABLE 1

Construct                                                      35S-GTOM5-nos    35S-MTOM5-nos
Total number of fruiting plants                                39               49
Number of plants producing yellow fruit                        8                0
Number of plants producing mixed yellow and red fruit
  or temporal changes                                          3                0
Number of plants producing red fruit                           28               49
% plants showing co-suppression of Psy1                        28%              0%
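
The percentages in Table 1 can be checked with the short Python sketch below. It assumes, as the surrounding text indicates, that a plant counts as co-suppressed if it bore yellow fruit or mixed yellow and red fruit; the dictionary keys and the function name are hypothetical labels introduced for the illustration.

```python
# Plant counts transcribed from Table 1.
counts = {
    "35S-GTOM5-nos": {"fruiting": 39, "yellow": 8, "mixed": 3, "red": 28},
    "35S-MTOM5-nos": {"fruiting": 49, "yellow": 0, "mixed": 0, "red": 49},
}


def cosuppression_percent(row):
    """Share of fruiting plants with yellow or mixed yellow/red fruit, in percent."""
    return 100.0 * (row["yellow"] + row["mixed"]) / row["fruiting"]


percentages = {name: cosuppression_percent(row) for name, row in counts.items()}
```

For the GTOM5 construct this is 11 of 39 plants, which rounds to the 28% quoted; for MTOM5 it is 0 of 49.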



FIGURE 1

Sequence Alignment of Modified TOM5
with the synthetic MTOM5

TOM5   ATG TCT GTT GCC TTG TTA TGG GTT GTT TCT   30
MTOM5  ATG AGC GTG GCA CTT CTT TGG GTG GTG AGC   30
       M   S   V   A   L   L   W   V   V   S

TOM5   CCT TGT GAC GTC TCA AAT GGG ACA AGT TTC   60
MTOM5  CCA TGC GAT GTG AGT AAC GGC ACT TCA TTT   60
       P   C   D   V   S   N   G   T   S   F

TOM5   ATG GAA TCA GTC CGG GAG GGA AAC CGT TTT   90
MTOM5  ATG GAG AGT GTG AGA GAA GGT AAT AGA TTC   90
       M   E   S   V   R   E   G   N   R   F

TOM5   TTT GAT TCA TCG AGG CAT AGG AAT TTG GTG   120
MTOM5  TTC GAC AGT TCT CGT CAC CGT AAC CTT GTT   120
       F   D   S   S   R   H   R   N   L   V

TOM5   TCC AAT GAG AGA ATC AAT AGA GGT GGT GGA   150
MTOM5  AGT AAC GAA CGT ATA AAC AGG GGA GGA GGT   150
       S   N   E   R   I   N   R   G   G   G

TOM5   AAG CAA ACT AAT AAT GGA CGG AAA TTT TCT   180
MTOM5  AAA CAG ACA AAC AAC GGT AGA AAG TTC TCA   180
       K   Q   T   N   N   G   R   K   F   S

TOM5   GTA CGG TCT GCT ATT TTG GCT ACT CCA TCT   210
MTOM5  GTT AGA TCA GCA ATC CTT GCA ACA CCT AGC   210
       V   R   S   A   I   L   A   T   P   S

TOM5   GGA GAA CGG ACG ATG ACA TCG GAA CAG ATG   240
MTOM5  GGT GAG AGA ACT ATG ACT AGC GAG CAA ATG   240
       G   E   R   T   M   T   S   E   Q   M

TOM5   GTC TAT GAT GTG GTT TTG AGG CAG GCA GCC   270
MTOM5  GTG TAC GAC GTC GTA CTT CGT CAA GCT GCA   270
       V   Y   D   V   V   L   R   Q   A   A

TOM5   TTG GTG AAG AGG CAA CTG AGA TCT ACC AAT   300
MTOM5  CTA GTT AAA CGT CAG TTA CGT AGT ACT AAC   300
       L   V   K   R   Q   L   R   S   T   N

TOM5   GAG TTA GAA GTG AAG CCG GAT ATA CCT ATT   330
MTOM5  GAA CTT GAG GTT AAA CCT GAC ATT CCA ATA   330
       E   L   E   V   K   P   D   I   P   I

TOM5   CCG GGG AAT TTG GGC TTG TTG AGT GAA GCA   360
MTOM5  CCT GGA AAC CTT GGA CTT CTT TCT GAG GCT   360
       P   G   N   L   G   L   L   S   E   A

TOM5   TAT GAT AGG TGT GGT GAA GTA TGT GCA GAG   390
MTOM5  TAC GAC AGA TGC GGA GAG GTT TGC GCA GAA   390
       Y   D   R   C   G   E   V   C   A   E

TOM5   TAT GCA AAG ACG TTT AAC TTA GGA ACT ATG   420
MTOM5  TAC GCT AAA ACC TTC AAT TTG GGT ACC ATG   420
       Y   A   K   T   F   N   L   G   T   M

TOM5   CTA ATG ACT CCC GAG AGA AGA AGG GCT ATC   450
MTOM5  TTG ATG ACA CCA GAA AGG CGT CGT GCA ATA   450
       L   M   T   P   E   R   R   R   A   I

TOM5   TGG GCA ATA TAT GTA TGG TGC AGA AGA ACA   480
MTOM5  TGG GCT ATT TAC GTT TGG TGT AGG CGT ACT   480
       W   A   I   Y   V   W   C   R   R   T

TOM5   GAT GAA CTT GTT GAT GGC CCA AAC GCA TCA   510
MTOM5  GAC GAG TTA GTG GAC GGA CCT AAT GCT AGT   510
       D   E   L   V   D   G   P   N   A   S

TOM5   TAT ATT ACC CCG GCA GCC TTA GAT AGG TGG   540
MTOM5  TAC ATA ACA CCC GCT GCT CTT GAC AGA TGG   540
       Y   I   T   P   A   A   L   D   R   W

TOM5   GAA AAT AGG CTA GAA GAT GTT TTC AAT GGG   570
MTOM5  GAG AAC CGT TTG GAG GAC GTG TTT AAC GGC   570
       E   N   R   L   E   D   V   F   N   G

TOM5   CGG CCA TTT GAC ATG CTC GAT GGT GCT TTG   600
MTOM5  AGA CCT TTC GAT ATG TTG GAC GGA GCA CTT   600
       R   P   F   D   M   L   D   G   A   L

TOM5   TCC GAT ACA GTT TCT AAC TTT CCA GTT GAT   630
MTOM5  AGT GAC ACT GTG AGC AAT TTC CCT GTG GAC   630
       S   D   T   V   S   N   F   P   V   D

TOM5   ATT CAG CCA TTC AGA GAT ATG ATT GAA GGA   660
MTOM5  ATC CAA CCT TTT CGG GAC ATG ATC GAG GGC   660
       I   Q   P   F   R   D   M   I   E   G

TOM5   ATG CGT ATG GAC TTG AGA AAA TCG AGA TAC   690
MTOM5  ATG AGA ATG GAT CTT CGT AAG TCT CGT TAT   690
       M   R   M   D   L   R   K   S   R   Y

TOM5   AAA AAC TTC GAC GAA CTA TAC CTT TAT TGT   720
MTOM5  AAG AAT TTT GAT GAG TTG TAT TTG TAC TGC   720
       K   N   F   D   E   L   Y   L   Y   C

TOM5   TAT TAT GTT GCT GGT ACG GTT GGG TTG ATG   750
MTOM5  TAC TAC GTG GCA GGA ACC GTG GGC CTT ATG   750
       Y   Y   V   A   G   T   V   G   L   M

TOM5   AGT GTT CCA ATT ATG GGT ATC GCC CCT GAA   780
MTOM5  TCA GTG CCT ATC ATG GGA ATT GCA CCA GAG   780
       S   V   P   I   M   G   I   A   P   E

TOM5   TCA AAG GCA ACA ACA GAG AGC GTA TAT AAT   810
MTOM5  AGT AAA GCT ACT ACT GAA TCT GTT TAC ACC   810
       S   K   A   T   T   E   S   V   Y   N

TOM5   GCT GCT TTG GCT CTG GGG ATC GCA AAT CAA   840
MTOM5  GCA GCA CTA GCA TTA GGT ATA GCT AAC CAG   840
       A   A   L   A   L   G   I   A   N   Q

TOM5   TTA ACT AAC ATA CTC AGA GAT GTT GGA GAA   870
MTOM5  CTT ACA AAT ATC TTG AGG GAC GTG GGT GAG   870
       L   T   N   I   L   R   D   V   G   E

TOM5   GAT GCC AGA AGA GGA AGA GTC TAC TTG CCT   900
MTOM5  GAC GCA CGT AGG GGT CGT GTG TAT CTC CCA   900
       D   A   R   R   G   R   V   Y   L   P

TOM5   CAA GAT GAA TTA GCA CAG GCA GGT CTA TCC   930
MTOM5  CAG GAC GAG CTC GCT CAA GCT GGA TTG AGT   930
       Q   D   E   L   A   Q   A   G   L   S

TOM5   GAT GAA GAT ATA TTT GCT GGA AGG GTG ACC   960
MTOM5  GAC GAG GAC ATT TTC GCA GGT CGT GTT ACA   960
       D   E   D   I   F   A   G   R   V   T

TOM5   GAT AAA TGG AGA ATC TTT ATG AAG AAA CAA   990
MTOM5  GAC AAG TGG AGG ATT TTC ATG AAA AAG CAG   990
       D   K   W   R   I   F   M   K   K   Q

TOM5   ATA CAT AGG GCA AGA AAG TTC TTT GAT GAG   1020
MTOM5  ATT CAC CGT GCT CGT AAA TTT TTC GAC GAA   1020
       I   H   R   A   R   K   F   F   D   E

TOM5   GCA GAG AAA GGC GTG ACA GAA TTG AGC TCA   1050
MTOM5  GCT GAA AAG GGA GTT ACT GAG CTT TCT AGT   1050
       A   E   K   G   V   T   E   L   S   S

TOM5   GCT AGT AGA TTC CCT GTA TGG GCA TCT TTG   1080
MTOM5  GCA TCA AGG TTT CCA GTT TGG GCC AGC CTT   1080
       A   S   R   F   P   V   W   A   S   L

TOM5   GTC TTG TAC CGC AAA ATA CTA GAT GAG ATT   1110
MTOM5  GTG CTC TAT AGA AAG ATT TTG GAC GAA ATC   1110
       V   L   Y   R   K   I   L   D   E   I

TOM5   GAA GCC AAT GAC TAC AAC AAC TTC ACA AAG   1140
MTOM5  GAG GCT AAC GAT TAT AAT AAT TTT ACT AAA   1140
       E   A   N   D   Y   N   N   F   T   K

TOM5   AGA GCA TAT GTG AGC AAA TCA AAG AAG TTG   1170
MTOM5  CGT GCT TAC GTT TCT AAG AGC AAA AAA CTT   1170
       R   A   Y   V   S   K   S   K   K   L

TOM5   ATT GCA TTA CCT ATT GCA TAT GCA AAA TCT   1200
MTOM5  ATC GCT CTT CCA ATC GCT TAC GCT AAG AGC   1200
       I   A   L   P   I   A   Y   A   K   S

TOM5   CTT GTG CCT CCT ACA AAA ACT GCC TCT CTT   1230
MTOM5  TTG GTT CCA CCA ACT AAG ACA GCT AGC TTG   1230
       L   V   P   P   T   K   T   A   S   L

TOM5   CAA AGA TAA   1239
MTOM5  CAG AGG TGA   1239
       Q   R   *

. = Same Base
- = Different Base

DNA SEQUENCE:      63% HOMOLOGY
PROTEIN SEQUENCE:  100% HOMOLOGY

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: ZENECA LIMITED
(ii) TITLE OF INVENTION: ENHANCEMENT OF GENE EXPRESSION
(iii) NUMBER OF SEQUENCES: 3

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: IP DEPT., ZENECA AGROCHEMICALS,
(B) STREET: JEALOTTS HILL RESEARCH STATION,
(C) CITY: BRACKNELL,
(D) STATE: BERKSHIRE
(E) COUNTRY: GB
(F) ZIP: RG42 6ET

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: WO NOT KNOWN
(B) FILING DATE:
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: HUSKISSON, FRANK M
(C) REFERENCE/DOCKET NUMBER: PPD 50156/WO

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 01344 414822

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1239 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:
(C) INDIVIDUAL ISOLATE: SYNTHETIC DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ATGAGCGTGG CACTTCTTTG GGTGGTGAGC CCATGCGATG TGAGTAACGG CACTTCATTT 
60 



ATGGAGAGTG TGAGAGAAGG TAATAGATTC TTCGACAGTT CTCGTCACCG TAACCTTGTT 
120 



AGTAACGAAC GTATAAACAG GGGAGGAGGT AAACAGACAA ACAACGGTAG AAAGTTCTCA 
180 



GTTAGATCAG CAATCCTTGC AAC AC CTAGC GGTGAGAGAA CTATGACTAG CGAGCAAATG 
240 



GTGTACGACG TCGTACTTCG TCAAGCTGCA C TAG TT AAAC GTCAGTTACG TAGTACTAAC 
300 

GAACTTGAGG TTAAACCTGA CATTCCAATA CC TGGAAAC C TTGGACTTCT TTCTGAGGCT 
360 

TACGACAGAT GCGGAGAGGT TTGCGCAGAA TACGCTAAAA CCTTCAATTT GGGTACCATG 
420 

TTGATGACAC CAGAAAGGCG TCGTGCAATA TGGGCTATTT ACGTTTGGTG TAGGCGTACT 
480 

GACGAGTTAG TGGACGGACC TAATGCTAGT TACATAACAC CCGCTGCTCT TGACAGATGG 
540 



GAGAACCGTT TGGAGGACGT GTTTAACGGC AGACCTTTCG ATATGTTGGA CGGAGCACT' 
600 
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AGTGACACTG TGAGCAATTT CCCTGTGGAC ATCCAACCTT TTCGGGACAT GATCGAGGGC 
660 

ATGAGAATGG ATCTTCGTAA GTCTCGTTAT AAGAATTTTG ATGAGTTGTA TTTGTACTGC 
720 

TACTACGTGG CAGGAACCGT GGGCCTTATG TCAGTGCCTA TCATGGGAAT TGCACCAGAG 
780 

AGTAAAGCTA CTACTGAATC TGTTTACACC GCAGCACTAG CATTAGGTAT AGCTAACCAG 
840 

CTTACAAATA TCTTGAGGGA CGTGGGTGAG GACGCACGTA GGGGTCGTGT GTATCTCCCA 
900 

CAGGACGAGC TCGCTCAAGC TGGATTGAGT GACGAGGACA TTTTCGCAGG TCGTGTTACA 
960 

GACAAGTGGA GGATTTTCAT GAAAAAGCAG ATTCACCGTG CTCGTAAATT TTTCGACGAA 
1020 

GCTGAAAAGG GAGTTACTGA GCTTTCTAGT GCATCAAGGT TTCCAGTTTG GGCCAGCCTT 
1080 

GTGCTCTATA GAAAGATTTT GGACGAAATC GAGGCTAACG ATTATAATAA TTTTAC T AAA 
1140 

CGTGCTTACG TTTCTAAGAG CAAAAAACTT ATCGCTCTTC CAATCGCTTA CGCTAAGAGC 
1200 

TTGGTTCCAC CAACTAAGAC AGCTAGCTTG CAGAGGTGA 
1239 
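
A reader who wants to check the stated length against the printed rows could use a small script like the following. It is a minimal sketch, not part of the application: the helper name `parse_listing` is hypothetical, and it assumes each sequence row contains only the letters A, C, G and T in space-separated blocks, with the running position count on a separate line. The sample text is the final two rows of SEQ ID NO: 1 as printed above.

```python
import re

def parse_listing(text):
    """Join the sequence rows of a listing, skipping blank lines and position counters.

    Hypothetical helper: a row is kept only if it consists solely of A/C/G/T and spaces.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if re.fullmatch(r"[ACGT ]+", line):
            rows.append(line.replace(" ", ""))
    return "".join(rows)

# Final two rows of SEQ ID NO: 1, with their position counters.
TAIL = """
CGTGCTTACG TTTCTAAGAG CAAAAAACTT ATCGCTCTTC CAATCGCTTA CGCTAAGAGC
1200

TTGGTTCCAC CAACTAAGAC AGCTAGCTTG CAGAGGTGA
1239
"""

tail_seq = parse_listing(TAIL)
# The first row covers positions 1141-1200, the second positions 1201-1239.
print(len(tail_seq), 1140 + len(tail_seq))
```

Rows containing OCR damage (stray spaces inside a block are tolerated, but non-ACGT characters are not) would be skipped by this filter, so a length mismatch is a prompt to inspect the source rather than proof of an error.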

(2) INFORMATION FOR SEQ ID NO : 2 : 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 1239 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE : 

(A) ORGANISM: LYCOPERSICON ESCULENTUM (TOMATO) 

(vii) IMMEDIATE SOURCE: 
(B) CLONE: GTOM5 - PHYTOENE SYNTHASE GENE 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 2 : 

ATGTCTGTTG CCTTGTTATG GGTTGTTTCT CCTTGTGACG TCTCAAATGG GACAAGTTTC 
60 

ATGGAATCAG TCCGGGAGGG AAACCGTTTT TTTGATTCAT CGAGGCATAG GAATTTGGTG 
120 

TCCAATGAGA GAATCAATAG AGGTGGTGGA AAGCAAACTA ATAATGGACG GAAATTTTCT 
180 

GTACGGTCTG CTATTTTGGC TACTCCATCT GGAGAACGGA CGATGACATC GGAACAGATG 
240 

GTCTATGATG TGGTTTTGAG GCAGGCAGCC TTGGTGAAGA GGCAACTGAG ATCTACCAAT 
300 

GAGTTAGAAG TGAAGCCGGA TATACCTATT CCGGGGAATT TGGGCTTGTT GAGTGAAGCA 
360 

TATGATAGGT GTGGTGAAGT ATGTGCAGAG TATGCAAAGA CGTTTAACTT AGGAACTATG 
420 

CTAATGACTC CCGAGAGAAG AAGGGCTATC TGGGCAATAT ATGTATGGTG CAGAAGAACA 
480 

GATGAACTTG TTGATGGCCC AAACGCATCA TATATTACCC CGGCAGCCTT AGATAGGTGG 
540 

GAAAATAGGC TAGAAGATGT TTTCAATGGG CGGCCATTTG ACATGCTCGA TGGTGCTTTG 
600 

TCCGATACAG TTTCTAACTT TCCAGTTGAT ATTCAGCCAT TCAGAGATAT GATTGAAGGA 
660 

ATGCGTATGG ACTTGAGAAA ATCGAGATAC AAAAACTTCG ACGAACTATA CCTTTATTGT 
720 

TATTATGTTG CTGGTACGGT TGGGTTGATG AGTGTTCCAA TTATGGGTAT CGCCCCTGAA 
780 

TCAAAGGCAA CAACAGAGAG CGTATATAAT GCTGCTTTGG CTCTGGGGAT CGCAAATCAA 
840 

TTAACTAACA TACTCAGAGA TGTTGGAGAA GATGCCAGAA GAGGAAGAGT CTACTTGCCT 
900 

CAAGATGAAT TAGCACAGGC AGGTCTATCC GATGAAGATA TATTTGCTGG AAGGGTGACC 
960 

GATAAATGGA GAATCTTTAT GAAGAAACAA ATACATAGGG CAAGAAAGTT CTTTGATGAG 
1020 

GCAGAGAAAG GCGTGACAGA ATTGAGCTCA GCTAGTAGAT TCCCTGTATG GGCATCTTTG 
1080 

GTCTTGTACC GCAAAATACT AGATGAGATT GAAGCCAATG ACTACAACAA CTTCACAAAG 
1140 

AGAGCATATG TGAGCAAATC AAAGAAGTTG ATTGCATTAC CTATTGCATA TGCAAAATCT 
1200 

CTTGTGCCTC CTACAAAAAC TGCCTCTCTT CAAAGATAA 
1239 

(2) INFORMATION FOR SEQ ID NO : 3 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 402 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: LYCOPERSICON ESCULENTUM (TOMATO) 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY : TRANSLATION PRODUCT OF GTOM5 AND MTOM5 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 3 : 

Met Ser Val Ala Leu Leu Trp Val Val Ser Pro Cys Asp Val Ser Asn 
1 5 10 15 

Gly Thr Ser Phe Met Glu Ser Val Arg Glu Gly Asn Arg Phe Phe Asp 
20 25 30 

Ser Ser Arg His Arg Asn Leu Val Ser Asn Glu Arg Ile Asn Arg Gly 
35 40 45 

Gly Gly Lys Gln Thr Asn Asn Gly Arg Lys Phe Ser Val Arg Ser Ala 
50 55 60 

Ile Leu Ala Thr Pro Ser Gly Glu Arg Thr Met Thr Ser Glu Gln Met 
65 70 75 80 

Val Tyr Asp Val Val Leu Arg Gln Ala Ala Leu Val Lys Arg Gln Leu 
85 90 95 

Arg Ser Thr Asn Glu Leu Glu Val Lys Pro Asp Ile Pro Ile Pro Gly 
100 105 110 

Asn Leu Gly Leu Leu Ser Glu Ala Tyr Asp Arg Cys Gly Glu Val Cys 
115 120 125 

Ala Glu Tyr Ala Lys Thr Phe Asn Leu Gly Thr Met Leu Met Thr Pro 
130 135 140 

Glu Arg Arg Arg Ala Ile Trp Ala Ile Tyr Val Trp Cys Arg Arg Thr 
145 150 155 160 

Asp Glu Leu Val Asp Gly Pro Asn Ala Ser Tyr Ile Thr Pro Ala Ala 
165 170 175 

Leu Asp Arg Trp Glu Asn Arg Leu Glu Asp Val Phe Asn Gly Arg Pro 
180 185 190 

Phe Asp Met Leu Asp Gly Ala Leu Ser Asp Thr Val Ser Asn Phe Pro 
195 200 205 

Val Asp Ile Gln Pro Phe Arg Asp Met Ile Glu Gly Met Arg Met Asp 
210 215 220 

Leu Arg Lys Ser Arg Tyr Lys Asn Phe Asp Glu Leu Tyr Leu Tyr Cys 
225 230 235 240 

Tyr Tyr Val Ala Gly Thr Val Gly Leu Met Ser Val Pro Ile Met Gly 
245 250 255 

Ile Ala Pro Glu Ser Lys Ala Thr Thr Glu Ser Val Tyr Asn Ala Ala 
260 265 270 

Leu Ala Leu Gly Ile Ala Asn Gln Leu Thr Asn Ile Leu Arg Asp Val 
275 280 285 

Gly Glu Asp Ala Arg Arg Gly Arg Val Tyr Leu Pro Gln Asp Glu Leu 
290 295 300 

Ala Gln Ala Gly Leu Ser Asp Glu Asp Ile Phe Ala Gly Arg Val Thr 
305 310 315 320 

Ile His Arg Ala Arg Lys Phe Phe Asp Glu Ala Glu Lys Gly Val Thr 
325 330 335 

Glu Leu Ser Ser Ala Ser Arg Phe Pro Val Trp Ala Ser Leu Val Leu 
340 345 350 

Tyr Arg Lys Ile Leu Asp Glu Ile Glu Ala Asn Asp Tyr Asn Asn Phe 
355 360 365 

Thr Lys Arg Ala Tyr Val Ser Lys Ser Lys Lys Leu Ile Ala Leu Pro 
370 375 380 

Ile Ala Tyr Ala Lys Ser Leu Val Pro Pro Thr Lys Thr Ala Ser Leu 
385 390 395 400 

Gln Arg 

CLAIMS 

1. A method of enhancing expression of a selected protein by an organism having 
a gene which produces said protein, comprising inserting into the genome of 
the said organism a DNA the nucleotide sequence of which is such that the 
RNA produced on transcription is different from but the protein produced on 
translation is the same as that expressed by the gene already present in the 
genome. 

2. A method as claimed in claim 1, in which the organism is a plant. 

3. A method as claimed in claim 2, in which the plant is a tomato plant. 

4. A method as claimed in any preceding claim, in which the selected gene is the 
gene encoding phytoene synthase. 

5. A method as claimed in claim 4, in which the coding region of the said inserted 
gene has the sequence SEQ-ID-NO-1. 

6. A gene construct comprising in sequence a promoter which is operable in a 
target organism, a coding region encoding a protein and a termination signal 
characterised in that the nucleotide sequence of the said construct is such that 
the RNA produced on transcription is different from but the protein produced 
on translation is the same as that expressed by the gene already present in the 
genome. 

7. A method of enhancing expression of carotenoids in a plant comprising 
overexpressing in the plant a gene specifying an enzyme necessary to the 
biosynthesis of carotenoids, the said overexpression being achieved by the use 
of a modified transgene having a different nucleotide sequence from the 
endogenous sequence. 

8. A method as claimed in claim 7, in which the modified gene specifies phytoene 
synthase. 

9. A modified chloroplast targeting sequence comprising nucleotides 1 to 417 of 
SEQ-ID-NO-1 
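
The principle recited in claims 1 and 6 (a transcript that differs in nucleotide sequence while the translated protein is unchanged) can be illustrated with the first 60 nucleotides of SEQ ID NO: 1 and SEQ ID NO: 2. The following is a minimal sketch for illustration only and forms no part of the claims; the names `translate`, `identity`, `SEQ1_START` and `SEQ2_START` are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption: nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acids ('*' marks a stop)."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def identity(a, b):
    """Fraction of positions with the same base in two sequences of equal length."""
    a, b = a.replace(" ", ""), b.replace(" ", "")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Nucleotides 1-60 of SEQ ID NO: 1 (synthetic) and SEQ ID NO: 2 (GTOM5).
SEQ1_START = "ATGAGCGTGG CACTTCTTTG GGTGGTGAGC CCATGCGATG TGAGTAACGG CACTTCATTT"
SEQ2_START = "ATGTCTGTTG CCTTGTTATG GGTTGTTTCT CCTTGTGACG TCTCAAATGG GACAAGTTTC"

print(translate(SEQ1_START))
print(translate(SEQ2_START))
print(round(identity(SEQ1_START, SEQ2_START), 2))
```

Both fragments translate to the same twenty residues, Met Ser Val Ala Leu Leu Trp Val Val Ser Pro Cys Asp Val Ser Asn Gly Thr Ser Phe, which are residues 1 to 20 of SEQ ID NO: 3, although the two nucleotide sequences differ at many positions.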